41 research outputs found

    Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

    Background: While there are many methods for predicting protein-protein interaction, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data which is not available at the proteome-scale. Here we present PIPE-Sites, a novel method of protein-specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome-scale. Results: PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting protein pairs with experimentally-determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2347 yeast and 14,438 human novel interacting protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs. Conclusions: PIPE-Sites is an accurate method for predicting protein binding sites and is applicable to the proteome-scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online a

    Fitness Tradeoffs of Antibiotic Resistance in Extraintestinal Pathogenic Escherichia coli

    Evolutionary trade-offs occur when selection on one trait has detrimental effects on other traits. In pathogenic microbes, it has been hypothesized that antibiotic resistance trades off with fitness in the absence of antibiotic. Although studies of single resistance mutations support this hypothesis, it is unclear whether trade-offs are maintained over time, due to compensatory evolution and broader effects of genetic background. Here, we leverage natural variation in 39 extraintestinal clinical isolates of Escherichia coli to assess trade-offs between growth rates and resistance to fluoroquinolone and cephalosporin antibiotics. Whole-genome sequencing identifies a broad range of clinically relevant resistance determinants in these strains. We find evidence for a negative correlation between growth rate and antibiotic resistance, consistent with a persistent trade-off bet

    The Cluster Editing Problem: Implementations And Experiments

    In this paper, we study the cluster editing problem which is fixed parameter tractable. We presen

    MP-PIPE: A massively parallel protein-protein interaction prediction engine

    Interactions among proteins are essential to many biological functions in living cells, but experimentally detected interactions represent only a small fraction of the real interaction network. Computational protein interaction prediction methods have become important to augment the experimental methods, in particular sequence-based prediction methods that do not require additional data such as homologous sequences or 3D structure information, which are often not available. Our Protein Interaction Prediction Engine (PIPE) method falls into this category. Park has recently compared PIPE with the other competing methods and concluded that our method "significantly outperforms the others in terms of recall-precision across both the yeast and human data". Here, we present MP-PIPE, a new massively parallel PIPE implementation for large-scale, high-throughput protein interaction prediction. MP-PIPE enabled us to perform the first-ever complete scan of the entire human protein interaction network, a massively parallel computational experiment which took three months of full-time 24/7 computation on a dedicated SUN UltraSparc T2+ based cluster with 50 nodes, 800 processor cores and 6,400 hardware-supported threads. The implications for the understanding of human cell function will be significant as biologists are starting to analyze the 130,470 new protein interactions and possible new pathways in human cells predicted by MP-PIPE

    Fast and scalable protein motif sequence clustering based on Hadoop framework

    In recent years, we have been faced with large amounts of sporadic unstructured data on the web. With the explosive growth of such data, there is a growing need for effective methods such as clustering to analyze and extract information. Biological data forms an important part of unstructured data on the web. Protein sequence databases are considered a primary source of biological data. Clustering can help to organize sequences into homologous and functionally similar groups and can improve the speed of data processing and analysis. Proteins are responsible for most of the activities in cells. The majority of proteins perform their function through interaction with other proteins. Hence, prediction of protein interactions is an important research area in the biomedical sciences. Motifs are fragments that occur frequently in protein sequences. A well-known method to specify protein interaction is based on motif clustering. Existing motif clustering methods share a limitation on the number of clusters. However, given the vast number of motifs and the need for a large number of clusters, an efficient, scalable and fast method is necessary to cluster such a large number of sequences. In this paper, we propose a novel approach to cluster a large number of motifs. Our approach includes extracting motifs within protein sequences, feature selection, preprocessing, dimension reduction and utilizing BigFCM (a large-scale fuzzy clustering) on several distributed nodes with the Hadoop framework to take advantage of MapReduce programming. Experimental results show very good performance of our approach

    Computational approaches toward the design of pools for the in vitro selection of complex aptamers

    It is well known that using random RNA/DNA sequences for SELEX experiments will generally yield low-complexity structures. Early experimental results suggest that having a structurally diverse library, which, for instance, includes high-order junctions, may prove useful in finding new functional motifs. Here, we develop two computational methods to generate sequences that exhibit higher structural complexity and can be used to increase the overall structural diversity of initial pools for in vitro selection experiments. Random Filtering selectively increases the number of five-way junctions in RNA/DNA pools, and Genetic Filtering designs RNA/DNA pools to a specified structure distribution, whether uniform or otherwise. We show that using our computationally designed DNA pool greatly improves access to highly complex sequence structures for SELEX experiments (without losing our ability to select for common one-way and two-way junction sequences)